fix: reduce driver/base import overhead #1615

Open
akashmalbari wants to merge 3 commits into apache:main from akashmalbari:1246-import-overhead

@akashmalbari

Closes #1246.

This PR reduces import-time overhead for `from hamilton import driver, base` by deferring work that the initial driver/base import path does not need.

Changes

  • Profiled `from hamilton import driver, base` using `python -X importtime` and cProfile.
  • Identified unnecessary import-time overhead in the driver/base import path.
  • Deferred the relevant import work so it only happens when needed.
  • Preserved existing public import behavior for:
    • from hamilton import driver, base
    • from hamilton.driver import Driver
    • from hamilton import base

How I tested this

Import smoke test:

python -c "from hamilton import driver, base; print(driver.Driver); print(base.DictResult)"

## Profiling

Commands used:

```bash
python -X importtime -c "from hamilton import driver, base"
python -c "import cProfile; cProfile.run('from hamilton import driver, base', '/tmp/hamilton_import_before.prof')"
python -c "import cProfile; cProfile.run('from hamilton import driver, base', '/tmp/hamilton_import_after.prof')"
```

Before: 1.068s importtime, 0.894s cProfile.
After: 0.111s importtime, 0.188s cProfile.
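
To see which modules account for totals like these, the `-X importtime` output can be ranked by cumulative time. The following is a rough sketch, not part of the PR; the helper names parse_importtime and slowest_imports are hypothetical. It relies on the report being written to stderr as `import time: self | cumulative | module` rows, with times in microseconds.

```python
import subprocess
import sys


def parse_importtime(stderr_text: str) -> list[tuple[int, str]]:
    """Return (cumulative_microseconds, module) pairs, slowest first."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        cumulative = parts[1].strip()
        if not cumulative.isdigit():
            continue  # skip the header row
        rows.append((int(cumulative), parts[2].strip()))
    return sorted(rows, reverse=True)


def slowest_imports(statement: str, top: int = 5) -> list[tuple[int, str]]:
    """Run `statement` in a fresh interpreter under -X importtime."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(proc.stderr)[:top]
```

Calling slowest_imports("from hamilton import driver, base") on each checkout would list the heaviest modules before and after the change, assuming hamilton is installed in the active environment.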

Member

@ArnavBalyan left a comment

LGTM, thanks for the change. Some minor comments below. The perf improvement looks great!

Comment thread tests/test_imports.py Outdated
# under the License.


def test_driver_base_public_imports():
Member

The test does not guard against future regressions. Should we add an assertion to ensure new code cannot eagerly load the heavy dependencies? Something along these lines:

  def test_driver_import_does_not_load_heavy_modules():
      import sys
      for mod in ("pandas", "numpy"):
          sys.modules.pop(mod, None)
      import hamilton.driver
      for mod in ("pandas", "numpy"):
          assert mod not in sys.modules, f"{mod} was loaded eagerly"

Comment thread hamilton/base.py Outdated
logger = logging.getLogger(__name__)


def _get_pandas():
Member

There are many _get_dependency functions, which will be hard to track and standardize in the future. Can we put them behind a shared helper?

Along the following lines:

  import importlib
  from functools import cache


  @cache
  def _lazy_import(module: str, attr: str | None = None):
      """Lazily import a module. Cached after the first call."""
      mod = importlib.import_module(module)
      return mod if attr is None else getattr(mod, attr)


  # Call at the point of use; a module-level call would import eagerly:
  #   pd  = _lazy_import("pandas")
  #   np  = _lazy_import("numpy")
  #   pde = _lazy_import("pandas.core.indexes.extension")

@akashmalbari force-pushed the 1246-import-overhead branch from c61fe33 to 165d06e on June 1, 2026 13:43
@akashmalbari
Author

Thanks for the review! I addressed both comments and resolved the latest conflict with main.

  • Added a subprocess-based regression test to ensure importing hamilton.driver does not eagerly load pandas or numpy.
  • Replaced the ad-hoc lazy dependency helpers in hamilton/base.py with a shared cached _lazy_import(...) helper.
  • Resolved the hamilton/driver.py conflict by keeping both with_data_quality_disabled(...) from main and the lazy/string annotation for with_materializers(...).
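
The test added in the PR is not shown in this thread. As a hedged sketch of what a subprocess-based check could look like: running the import in a fresh interpreter avoids depending on what the test session has already loaded, and avoids mutating sys.modules in the test process. The helper name loaded_after is hypothetical.

```python
import subprocess
import sys


def loaded_after(statement: str, candidates: tuple[str, ...]) -> set[str]:
    """Run `statement` in a fresh interpreter; return which candidates got imported."""
    code = (
        "import sys\n"
        f"{statement}\n"
        f"print(','.join(m for m in {candidates!r} if m in sys.modules))"
    )
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # An empty line means none of the candidates were loaded.
    return set(filter(None, proc.stdout.strip().split(",")))


def test_driver_import_does_not_load_heavy_modules():
    loaded = loaded_after("import hamilton.driver", ("pandas", "numpy"))
    assert not loaded, f"loaded eagerly: {sorted(loaded)}"
```

The sketch assumes the imported statement prints nothing to stdout itself; a stricter version could emit a sentinel prefix and parse only that line.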

Validation run:

  • .venv/bin/ruff check hamilton/driver.py hamilton/base.py tests/test_imports.py
  • .venv/bin/python -m pytest tests/test_imports.py -q
  • Import smoke check for `from hamilton import base, driver`

I also tried the relevant upstream data-quality tests locally, but this environment could not collect them cleanly because extension loading hit the local xgboost / missing libomp.dylib issue.

Member

@ArnavBalyan left a comment

Thanks, cc @skrawcz

Development

Successfully merging this pull request may close these issues.

[ergonomics] profile driver and base module imports

2 participants